Abstract

Being able to forecast extreme volatility is a central issue in financial risk management. We present a method for predicting large volatilities based on the distribution of recurrence intervals between volatilities exceeding a certain threshold $Q$ for a fixed expected recurrence time $\tau_Q$. We find that the recurrence intervals are well approximated by the $q$-exponential distribution for all stocks and all $\tau_Q$ values. Thus an analytical formula for the hazard probability $W(\Delta t|t)$ that a volatility above $Q$ will occur within a short interval $\Delta t$, given that the last volatility exceeding $Q$ happened $t$ periods ago, can be derived directly from the $q$-exponential distribution, and it is found to be in good agreement with the empirical hazard probability from real stock data. Using these results, we adopt a decision-making algorithm that triggers an alarm for the next volatility above $Q$ based on the hazard probability. Using a "receiver operator characteristic" (ROC) analysis, we find that this method efficiently forecasts the occurrence of large volatility events in real stock data. Our analysis may help us better understand recurring large volatilities and more accurately quantify financial risks in stock markets.

JEL classification: C14

Keywords: Extreme volatility; Risk estimation; Recurrence interval; Large volatility forecasting; Distribution; Hazard probability


1. Introduction 

Predicting extreme volatility events in financial markets is essential when estimating risk. A standard approach to extreme event prediction is to find the precursory patterns prior to an extreme event or to quantify the probability that a given pattern is a precursor to an extreme event. A more recent method is based on the statistics of the recurrence intervals between events exceeding a threshold; it determines the risk probability $W(\Delta t|t)$ that an extreme event will occur within the next $\Delta t$ intervals when the last extreme event occurred $t$ periods ago. When applied to real market data and to model data with a low level of noise, the prediction method based on recurrence interval analysis forecasts better than the method based on precursor pattern recognition.

Understanding the recurrence interval, defined as the waiting time between consecutive events with values greater than a predefined threshold $Q$, is essential in uncovering the underlying laws governing extreme events in many fields. Recurrence interval analysis has been carried out on many kinds of time series to predict the probability that an extreme event will occur, including records of climate, seismic activities, energy dissipation rates of three-dimensional turbulence, heartbeat intervals in medical science, precipitation and river runoff, internet traffic, financial volatilities, equity returns, and trading volumes. An improved method of estimating value at risk (VaR) in financial markets has been proposed based on the recurrence interval between the last two returns below $-Q$. This method is significantly more accurate than traditional estimates based on the overall or local return distributions.

To accurately estimate the risk probability and the VaR based on recurrence interval analysis, we need to know the distribution and memory behavior of the recurrence times between extreme events. The recurrence intervals of time series in many different fields exhibit fat-tailed distributions and long- and short-range memories, indicating that extreme events are not described by a Poisson process. Unlike long- and short-range memory behaviors, which are easily tested using conditional distribution analysis and the detrended fluctuation analysis (DFA) method, the distributional form of recurrence intervals remains elusive. For example, in financial markets the recurrence intervals of different data types (return, volatility, and trading volume), different data resolutions (minute-by-minute and daily), and different markets fit different distributions, including power-law, stretched exponential, and $q$-exponential. The recurrence interval distribution above a fixed threshold has been found to have a power-law tail for the daily volatilities in the Japanese market, the minute-by-minute volatilities in the Korean and Italian markets, the daily returns in US stock markets, the minute-by-minute returns in Chinese markets [18], and the minute-by-minute trading volume in US and Chinese markets. A number of studies, ranging from daily to high-frequency data and from developed to emerging markets, have also reported that the distribution of the recurrence intervals of financial volatility is a stretched exponential. In Chinese markets, the recurrence time between returns above a given positive threshold or below a negative threshold for the index spot and futures has been reported to fit a stretched exponential distribution. The recurrence intervals between losses in financial markets were recently found to fit a $q$-exponential distribution.

In this paper we describe the datasets in Sec. 2, present the theoretical framework for predicting large volatilities in Sec. 3, determine the distribution of the recurrence intervals between large volatilities in Sec. 4, and report the hazard probability results and the performance of the prediction algorithm in Sec. 5. In Sec. 6 we summarize our findings.


2. Data description 

To carry out a detailed recurrence interval analysis of Chinese stock markets, we include as many Chinese stocks in our sample as possible. The minute-by-minute price data of all stocks in the Chinese markets are extracted from the RESSET financial database. The extraction period is from 26 July 1999 to 30 December 2011, which is the maximum span allowed by the RESSET database. To ensure that the recurrence intervals between the top 1% volatilities contain more than 1,000 data points, we select only those stocks that have a minimum of two years of trading records. A sample of this size lowers the error rate when we use maximum likelihood estimation to fit the distributions. Finally, we have 1891 stocks in our sample, which include 853 A-shares, 54 B-shares, and 63 ChiNext shares in the Shenzhen market, and 867 A-shares and 54 B-shares in the Shanghai market.


3. Framework of predicting large volatilities 


3.1. Hazard probability $W(\Delta t|t)$

We use the hazard probability $W(\Delta t|t)$ to forecast the occurrence of large volatility events. $W(\Delta t|t)$ is the probability that the next large volatility event occurs within an additional waiting time $\Delta t$, given that the previous large volatility event occurred a time $t$ ago. This probability is the key early-warning measure for the occurrence of extreme volatilities. The early warning is triggered when $W(\Delta t|t)$ is greater than a predefined alarm threshold. We can theoretically derive this hazard probability if we have the distribution of the time intervals between consecutive extreme volatilities, which are defined as the volatilities that exceed a given threshold $Q$.

Using the probability density $p(t)$ of the recurrence intervals between the extreme volatilities, if the time elapsed since the last extreme event is $t$, we want to determine the probability density function $p(\Delta t|t)$ that quantifies the additional waiting time $\Delta t$ until the next extreme event. By Bayes' theorem for conditional probabilities, the probability that an event $A$ occurs, given the knowledge of an event $B$, is the quotient of the joint probability of $A$ and $B$ and the probability of event $B$,


$$p(A|B) = \frac{p(AB)}{p(B)}, \tag{1}$$

where $p(AB) = p(t + \Delta t)$ is the probability that no event occurs from 0 to $t$ and that an event occurs at $t + \Delta t$, and $p(B) = \int_t^{\infty} p(s)\,ds$ is the probability that no event occurs from 0 to $t$. Thus we have

$$p(\Delta t|t) = \frac{p(t + \Delta t)}{\int_t^{\infty} p(s)\,ds}. \tag{2}$$

Thus the hazard probability $W(\Delta t|t)$ that an extreme event will occur within a short time $\Delta t \ll t$, given that a time $t$ has elapsed since the previous extreme event, can be expressed as

$$W(\Delta t|t) = \frac{\int_t^{t+\Delta t} p(s)\,ds}{\int_t^{\infty} p(s)\,ds}. \tag{3}$$
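The ratio of integrals in Eq. (3) has a direct empirical counterpart: among the observed recurrence intervals longer than $t$, count the fraction that end by $t + \Delta t$. The following is a minimal sketch of that counting estimate, not the authors' code; the function name `empirical_hazard` and the sample values are hypothetical.

```python
from bisect import bisect_right


def empirical_hazard(intervals, t, dt):
    """Empirical counterpart of Eq. (3) for a sample of recurrence intervals.

    Returns the fraction of intervals longer than t that end within (t, t + dt].
    """
    taus = sorted(intervals)
    survived = len(taus) - bisect_right(taus, t)      # intervals with tau > t
    if survived == 0:
        return float("nan")                           # no observations beyond t
    ended = bisect_right(taus, t + dt) - bisect_right(taus, t)
    return ended / survived


# Hypothetical recurrence intervals (in minutes), for illustration only.
sample = [1, 2, 3, 4, 5, 10]
print(empirical_hazard(sample, t=2, dt=2))  # 2 of the 4 intervals longer than 2 end by 4
```

Sorting once and using binary search keeps each evaluation cheap when the estimate is needed for many values of $t$.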


3.2. Predicting algorithm 

For a given distribution $p(t)$ of the recurrence intervals between extreme volatilities, the formula for $W(\Delta t|t)$ can be obtained using Eq. (3). Setting $\Delta t = 1$, the hazard probability $W(1|t)$ and a decision-making algorithm can be used to predict large volatilities. To trigger an early warning that a large volatility is about to occur, we set a threshold $Q_p$ for the hazard probability. When the hazard probability exceeds $Q_p$, an alarm is activated indicating that a large volatility will occur at the next time point. We next estimate $Q_p$, which is chosen to give the maximum correct prediction rate for a given maximum false-alarm tolerance.

To determine $Q_p$ we estimate the correct prediction and false alarm rates for each $Q_p$ in the range $[0, 1]$. We then plot the correct prediction rate against the false alarm rate to obtain the "receiver operator characteristic" (ROC) curve. The ROC curve is used to quantify prediction efficiency. The appropriate $Q_p$ corresponds to the point on the ROC curve at which the false alarm rate equals the tolerated false alarm level.

To estimate the correct prediction and false alarm rates we generate, for a given $Q_p$, one of two forecasting signals (alarm or non-alarm) at each time point. By comparing the forecasting signals with the real data, we obtain one of four outcomes at each time point: (i) a correct prediction of a large volatility event, (ii) a correct prediction of a non-large volatility event, (iii) a missed event, and (iv) a false alarm. By counting how many times each outcome occurs in the test record, we can estimate the correct prediction rate $D$ and the false alarm rate $A$ using

$$D = \frac{O_{11}}{O_{01} + O_{11}}, \qquad A = \frac{O_{10}}{O_{00} + O_{10}}, \tag{4}$$

where $O_{11}$ is the number of large volatility events that are correctly predicted, $O_{00}$ the number of non-large volatility events that are correctly predicted, $O_{01}$ the number of missed events, and $O_{10}$ the number of false alarms. All possible pairs of $(D, A)$ are obtained by varying $Q_p$ from 0 to 1.

By definition, $D = A = 1$ if $Q_p = 0$ and $D = A = 0$ if $Q_p = 1$, so the ROC curve joins the point (0,0) in the bottom left corner to the point (1,1) in the top right corner. For a random guess, $D = A$, which is the straight line between the two corners. This occurs when there is no memory in the data. For a fixed value of $A$, the larger the correct prediction rate $D$, the better the algorithm performs.
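The counting behind Eq. (4) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: `hazard[i]` is taken to be the hazard probability $W(1|t)$ available before time point `i`, `is_event[i]` marks whether a large volatility actually occurs at `i`, and the function names and the 100-step grid for $Q_p$ are hypothetical.

```python
def roc_point(hazard, is_event, q_p):
    """Return (D, A) of Eq. (4) for one alarm threshold q_p."""
    o11 = o01 = o10 = o00 = 0
    for w, event in zip(hazard, is_event):
        alarm = w > q_p                 # alarm when the hazard exceeds Q_p
        if event and alarm:
            o11 += 1                    # correctly predicted event
        elif event:
            o01 += 1                    # missed event
        elif alarm:
            o10 += 1                    # false alarm
        else:
            o00 += 1                    # correctly predicted non-event
    d = o11 / (o01 + o11) if (o01 + o11) else float("nan")
    a = o10 / (o00 + o10) if (o00 + o10) else float("nan")
    return d, a


def roc_curve(hazard, is_event, n_steps=100):
    """(D, A) pairs for Q_p = 0, 1/n_steps, ..., 1."""
    return [roc_point(hazard, is_event, k / n_steps) for k in range(n_steps + 1)]
```

With strictly positive hazard values, `roc_point(..., 0.0)` returns `(1.0, 1.0)` and `roc_point(..., 1.0)` returns `(0.0, 0.0)`, matching the two end points of the ROC curve described above.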


4. Distribution of recurrence intervals between large volatilities 

4.1. Definition of volatilities and recurrence intervals

For a given minute-by-minute price series $p(t)$, the minute-by-minute volatility $\omega(t)$ can be estimated using

$$\omega(t) = |\ln p(t) - \ln p(t-1)|. \tag{5}$$

In order to eliminate the influence of the daily periodic patterns, we remove the intraday patterns from the volatility series $\omega(t)$ on each trading day,

$$\omega'(t) = \omega(t)/A(s), \tag{6}$$

where $A(s) = \sum_{i=1}^{N} \omega(i, s)/N$ and $s$ is the intraday time of $t$. Here $\omega(i, s)$ represents the volatility at time $s$ on day $i$, so $A(s)$ is the average volatility at intraday time $s$ over the $N$ trading days. The normalized volatility series $v(t)$ is then obtained by dividing $\omega'(t)$ by its standard deviation,

$$v(t) = \frac{\omega'(t)}{[\langle \omega'(t)^2 \rangle - \langle \omega'(t) \rangle^2]^{1/2}}. \tag{7}$$
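The three steps of Eqs. (5)-(7) can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: it assumes every trading day has the same number of minute prices, drops overnight returns, and uses the hypothetical input layout `prices_by_day` (one list of prices per day).

```python
import math
from statistics import pstdev


def normalized_volatility(prices_by_day):
    """Return the normalized volatility v(t) as one flat, chronological list."""
    # Eq. (5): absolute one-minute log-returns within each day (no overnight return).
    vol = [[abs(math.log(day[s]) - math.log(day[s - 1])) for s in range(1, len(day))]
           for day in prices_by_day]
    n_days, n_minutes = len(vol), len(vol[0])
    # Intraday pattern A(s): average volatility at intraday time s over all days.
    pattern = [sum(vol[i][s] for i in range(n_days)) / n_days for s in range(n_minutes)]
    # Eq. (6): remove the intraday pattern (a zero pattern value maps to zero).
    detrended = [vol[i][s] / pattern[s] if pattern[s] > 0 else 0.0
                 for i in range(n_days) for s in range(n_minutes)]
    # Eq. (7): divide by the standard deviation of the detrended series.
    sd = pstdev(detrended)
    return [w / sd for w in detrended]
```

By construction the returned series has unit standard deviation, which is what makes a single threshold $Q$ comparable across stocks.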


The focus of our study is the recurrence interval between the normalized volatilities exceeding a predefined threshold $Q$. To compare the results between different stocks, we quantify $Q$ by its mean recurrence time $\tau_Q$. There is a one-to-one correspondence between $Q$ and $\tau_Q$, such that

$$\frac{1}{\tau_Q} = \int_Q^{\infty} p(v)\,dv, \tag{8}$$

where $p(v)$ is the probability distribution of the volatility. Here we restrict $\tau_Q$ to the range $[20, 100]$. This range corresponds to extreme volatilities from the top 5% to the top 1%, which are often considered in risk estimation.
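Empirically, Eq. (8) says that a fraction $1/\tau_Q$ of the volatilities should exceed $Q$, so $Q$ can be read off the ranked volatility series. The sketch below illustrates this and the extraction of recurrence intervals; the function names are hypothetical, and ties at the threshold are handled in the simplest possible way.

```python
def threshold_for_mean_interval(vol, tau_q):
    """Pick Q so that roughly len(vol) / tau_q values lie strictly above it."""
    if len(vol) < 2 or tau_q < 2:
        raise ValueError("need at least two values and tau_q >= 2")
    ranked = sorted(vol, reverse=True)
    n_exceed = max(1, int(len(ranked) / tau_q))   # target number of exceedances
    return ranked[n_exceed]                       # first value not counted as extreme


def recurrence_intervals(vol, q):
    """Waiting times between consecutive values that exceed the threshold q."""
    times = [t for t, v in enumerate(vol) if v > q]
    return [b - a for a, b in zip(times, times[1:])]


# Toy series in which every tenth value is the largest, for illustration only.
toy = [i % 10 for i in range(100)]
q = threshold_for_mean_interval(toy, tau_q=10)
print(q, recurrence_intervals(toy, q))
```

For real data the mean of the extracted intervals is close to, but not exactly, $\tau_Q$, because the first and last partial intervals are not counted.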


4.2. Distribution formula of recurrence intervals 

To analytically determine the hazard probability we find the distribution that best approximates the recurrence interval distribution for all the stocks in our sample. We also determine whether the distribution parameters depend on the mean recurrence interval $\tau_Q$ and on whether the market is bear or bull. Previous research has indicated that the distribution of recurrence intervals between returns below a negative threshold $-Q$ depends only on the mean recurrence interval $\tau_Q$, and not on a specific asset or on the time resolution of the data. We first check whether the recurrence time between volatilities above a threshold $Q$ exhibits this behavior. Figure 1(a) shows the probability distribution of the recurrence intervals for 10 randomly chosen stocks. We shift the distribution curves of different $\tau_Q$ values vertically by a constant factor to increase visibility. Note that for the same $\tau_Q$ value the recurrence time distributions of different stocks nearly collapse onto the same curve. This means that the recurrence intervals between volatilities that exceed a threshold $Q$ may exhibit a universal distribution for different stocks when $\tau_Q$ is fixed. We also want to know whether the distributions of different $\tau_Q$ values share the same pattern. Previous research indicates that, for return recurrence intervals, these distributions are influenced only by the mean recurrence time $\tau_Q$. Although the distributions in Fig. 1(a) seem to be different for different $\tau_Q$, if we scale the distribution using the mean recurrence time $\tau_Q$ the six distributions seem to be parallel [see Fig. 1(b)].



Figure 1: (color online). Probability distribution of the recurrence intervals for 10 randomly chosen stocks. For better visibility, the distribution curves of $\tau_Q$ = 25, 40, 60, 80, and 100 are shifted vertically by a factor of 10, 100, 1000, 10000, and 100000, respectively. (a) Original recurrence intervals. (b) Scaled recurrence intervals.


To quantitatively measure how the recurrence interval distributions vary with the mean interval $\tau_Q$, we need a suitable distributional formula for capturing the recurrence interval distribution. Previous research shows that the recurrence intervals can be fitted by the stretched exponential distribution, the power-law distribution with an exponential cutoff, and the $q$-exponential distribution. The Weibull distribution is usually comparable to the $q$-exponential distribution, and we use both two-parameter and three-parameter Weibull distributions to fit the recurrence intervals in our analysis. The following are the five candidate distributions: the stretched exponential distribution,

$$p(\tau) = a \exp[-(b\tau)^{\mu}], \tag{9}$$

the power-law distribution with an exponential cutoff,

$$p(\tau) = c\,\tau^{-\gamma-1} \exp(-k\tau), \tag{10}$$

the $q$-exponential distribution,

$$p(\tau) = (2 - q)\lambda[1 + (q - 1)\lambda\tau]^{-\frac{1}{q-1}}, \tag{11}$$

the two-parameter Weibull distribution,

$$p(\tau) = \frac{\zeta}{d}\left(\frac{\tau}{d}\right)^{\zeta-1} \exp\left[-\left(\frac{\tau}{d}\right)^{\zeta}\right], \tag{12}$$

and the three-parameter Weibull distribution,

$$p(\tau) = \frac{\zeta}{d}\left(\frac{\tau - t_0}{d}\right)^{\zeta-1} \exp\left[-\left(\frac{\tau - t_0}{d}\right)^{\zeta}\right]. \tag{13}$$


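To compare the recurrence interval distributions of different $\tau_Q$ values, we normalize the recurrence intervals by $\tau_Q$, i.e., $x = \tau/\tau_Q$. We obtain the five candidate distributions used to fit the normalized recurrence intervals by substituting $\tau = x\tau_Q$ into $f(x) = p(\tau)\tau_Q$, i.e.,

$$f(x) = a\tau_Q \exp[-(b\tau_Q x)^{\mu}], \tag{14}$$

$$f(x) = c\,\tau_Q^{-\gamma} x^{-\gamma-1} \exp(-k\tau_Q x), \tag{15}$$

$$f(x) = (2 - q)(\lambda\tau_Q)[1 + (q - 1)(\lambda\tau_Q)x]^{-\frac{1}{q-1}}, \tag{16}$$

$$f(x) = \frac{\zeta}{d/\tau_Q}\left(\frac{x}{d/\tau_Q}\right)^{\zeta-1} \exp\left[-\left(\frac{x}{d/\tau_Q}\right)^{\zeta}\right], \tag{17}$$

and

$$f(x) = \frac{\zeta}{d/\tau_Q}\left(\frac{x - t_0/\tau_Q}{d/\tau_Q}\right)^{\zeta-1} \exp\left[-\left(\frac{x - t_0/\tau_Q}{d/\tau_Q}\right)^{\zeta}\right]. \tag{18}$$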


We use the maximum likelihood estimation (MLE) method to estimate the parameters of the five candidate distributions. The details of fitting the stretched exponential distribution and the power-law distribution with an exponential cutoff are presented in the Appendix. Note that in the following analysis we fit only the scaled recurrence intervals.
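As an illustration of MLE for the scaled $q$-exponential of Eq. (16), the sketch below maximizes the log-likelihood by a plain grid search over $q$ and $\lambda_x = \lambda\tau_Q$. The text does not describe how the $q$-exponential fit is optimized, so the grid search, the grid bounds, and the function names are assumptions made only for this example.

```python
import math


def qexp_loglik(xs, q, lam_x):
    """Log-likelihood of scaled intervals xs under Eq. (16), with lam_x = lambda * tau_Q."""
    n = len(xs)
    tail = sum(math.log1p((q - 1.0) * lam_x * x) for x in xs)
    return n * (math.log(2.0 - q) + math.log(lam_x)) - tail / (q - 1.0)


def fit_qexp_grid(xs):
    """Grid-search MLE over 1 < q < 2 and 0 < lam_x <= 5 (grid bounds are assumptions)."""
    best = None
    for i in range(1, 50):              # q = 1.02, 1.04, ..., 1.98
        q = 1.0 + i / 50.0
        for j in range(1, 51):          # lam_x = 0.1, 0.2, ..., 5.0
            lam_x = j / 10.0
            ll = qexp_loglik(xs, q, lam_x)
            if best is None or ll > best[0]:
                best = (ll, q, lam_x)
    return best[1], best[2]


# Hypothetical scaled intervals x = tau / tau_Q, for illustration only.
q_hat, lam_hat = fit_qexp_grid([0.1, 0.3, 0.5, 1.0, 1.2, 2.9])
```

A gradient-based optimizer would be faster on samples of realistic size; the grid is used here only because it is transparent and has no dependencies.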

Figure 2: (color online). Plots of distribution fits to normalized recurrence intervals for two stocks. The open markers are the empirical distributions and the solid lines are the fits to the five candidate distributions. For better visibility, the curves of $\tau_Q$ = 25, 40, 60, 80, and 100 are shifted vertically by a factor of 10, 100, 1000, 10000, and 100000, respectively. (a) Stock 000001. (b) Stock 900956.

Figure 2 shows the empirical distributions and the fits of the five candidate distributions to the scaled recurrence intervals for two stocks, 000001 and 900956. In the central regions of the distributions all five candidates agree with the empirical data, but the $q$-exponential distribution fits the distribution tail better than any other candidate for all $\tau_Q$. To determine which distribution performs best, we use the Kolmogorov-Smirnov (KS) statistic to quantify the agreement between the empirical distribution and the fitted distributions. Figures 3(a) and 3(d) show the KS statistics of the five candidate distributions with respect to the mean recurrence time $\tau_Q$ for the two stocks. For stock 000001 the $q$-exponential distribution outperforms the other distributions and has the smallest KS statistic for all $\tau_Q$. For stock 900956 the $q$-exponential distribution is best when $\tau_Q < 60$, and the stretched exponential distribution is best when $\tau_Q > 60$. Figures 3(b) and 3(e) plot the characteristic parameters of the five candidate distributions with respect to the mean interval $\tau_Q$ for stock 000001 and stock 900956, respectively. Note that all the fitting parameters of the five candidate distributions are independent of the mean recurrence time $\tau_Q$ and lie on a horizontal line. This indicates that the distributions of the recurrence intervals are not influenced by the threshold $Q$ when $\tau_Q$ is in the $[20, 100]$ range. Figures 3(c) and 3(f) plot the fitting parameters $\lambda_x$, defined as $\tau_Q\lambda$, and $q$ as functions of $\tau_Q$. Note that the fluctuations of $\lambda_x$ are small when $\tau_Q > 40$ for stock 000001 and over the whole range of $\tau_Q$ for stock 900956. These results indicate a scaling behavior in the volatility recurrence intervals under different thresholds.
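The KS statistic used for this comparison is the largest vertical distance between the empirical cumulative distribution and the fitted one. A minimal sketch follows, using the cumulative distribution obtained by integrating the scaled $q$-exponential of Eq. (16); the function names are hypothetical.

```python
def qexp_cdf(x, q, lam_x):
    """CDF of the scaled q-exponential, i.e. the integral of Eq. (16) from 0 to x."""
    return 1.0 - (1.0 + (q - 1.0) * lam_x * x) ** ((q - 2.0) / (q - 1.0))


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF of sample and the model cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


# Usage with hypothetical fitted parameters q = 1.3 and lam_x = 1.5:
ks = ks_statistic([0.1, 0.3, 0.5, 1.0, 1.2, 2.9], lambda x: qexp_cdf(x, 1.3, 1.5))
```

The same `ks_statistic` function can be reused for the other four candidates by passing their cumulative distributions, which makes the comparison across candidates like-for-like.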

Our goal is to find a distribution that can approximate the distribution of recurrence time. The best candidate distribution will be the one that provides both accuracy of fit and ease in estimating the hazard function $W(\Delta t|t)$. Note that the power-law distribution with an exponential cutoff yields the largest KS statistics for all the stocks, and that the three-parameter Weibull distribution has smaller KS statistics but requires too many parameters to be estimated. These two distributions are excluded from the candidate list. Because the $q$-exponential distribution has the smallest KS statistic in 75.7% of the fits, we use it to capture the distribution of the recurrence intervals. Even in the fits where its KS statistic is not the smallest, the $q$-exponential still provides a good approximation of the recurrence time distribution. The $q$-exponential distribution is also highly useful because it allows an analytical formula for the hazard function $W(\Delta t|t)$ to be derived.

4.3. More on distribution parameter behaviors 

For each stock and each $\tau_Q$ value we fit the corresponding scaled recurrence time $x$ with the $q$-exponential distribution and estimate the distribution parameters $q$ and $\lambda_x$. To quantitatively check the scaling behaviors we verify whether the parameters $q$ and $\lambda_x$ are independent of $\tau_Q$ for the same stock and whether they are the same across different stocks for the same $\tau_Q$. For each stock we linearly regress the fitting parameters $q$ and $\lambda_x$ on $\tau_Q$. For $q$ vs $\tau_Q$ we find that the maximum absolute slope is 0.006 and the mean slope over all stocks is $(5.68 \pm 7.77) \times 10^{\ldots}$, but we do not observe comparably small slopes for all stocks when fitting $\lambda_x$ vs $\tau_Q$. We find 945 stocks with absolute slopes below 0.006, which is the maximum absolute slope of $q$ vs $\tau_Q$. For $\lambda_x$ vs $\tau_Q$ the maximum slope over all stocks is 0.439 and the mean slope is $0.0068 \pm 0.0213$. These results suggest that the parameter $q$ is independent of the mean recurrence time $\tau_Q$ for all stocks, whereas the parameter $\lambda_x$ is independent of $\tau_Q$ for only about half of the stocks. The scaling behavior in the recurrence intervals for different $\tau_Q$ values therefore exists in about half of the stocks in the Chinese markets ($945/1891 \approx 50\%$).

Figure 3: (color online). Plots of fitting results for two stocks. (a-c) Stock 000001. (d-f) Stock 900956. (a, d) Plots of the KS statistics with respect to the mean interval $\tau_Q$. The KS statistic is used to describe the agreement between the empirical distribution and the fitted distributions. (b, e) Comparison of the characteristic fitting parameters for the five candidate distributions. (c, f) Plots of the fitting parameters $\lambda_x = \tau_Q\lambda$ and $q$ with respect to $\tau_Q$.

Since the value of $q$ does not depend on $\tau_Q$, we average the estimated $q$ over the different $\tau_Q$ values for each stock and plot the mean value $\langle q \rangle$ in Fig. 4(a). Note the three white areas separated by two shaded areas. The five areas from left to right represent A-shares in the Shenzhen market, B-shares in the Shenzhen market, ChiNext shares in the Shenzhen market, A-shares in the Shanghai market, and B-shares in the Shanghai market. Note that there are two groups of stocks, one with a relatively small value of $\langle q \rangle$ and the other with a much larger value of $\langle q \rangle$. The stocks in the group with the smaller $\langle q \rangle$ have been traded in the market for less than three years. Figure 4(b) shows the frequency of $\langle q \rangle$, in which there is a significant peak at $\langle q \rangle \approx 1.3$, which corresponds to the average value of $q$ for the stocks in the large-$q$ group. Note also that there is a small peak at $\langle q \rangle \approx 1.1$, which is the mean value of $q$ for the stocks in the small-$q$ group. Because the fitting parameter $\lambda_x$ of half of the stocks in our sample depends on $\tau_Q$, we plot the frequency of $\lambda_x$ of all stocks for different values of $\tau_Q$ [see Fig. 4(c)], instead of averaging the $\lambda_x$ of different $\tau_Q$ for each stock. Note that the distribution curve of $\lambda_x$ displays the same pattern across different values of $\tau_Q$, the only difference being that the range spanned by the distribution becomes wider as $\tau_Q$ increases. This further indicates that the fitting parameter $\lambda_x$ is affected by the mean recurrence time $\tau_Q$.

Figure 4: (color online). Plots of the estimated $q$ and $\lambda_x$ of the $q$-exponential distribution for all stocks. (a) Plots of the fitting parameter $q$ for different stocks. There are five areas: three white areas separated by two dark areas. From left to right, the first area represents the A-shares in the Shenzhen market, the second area the B-shares in the Shenzhen market, the third area the ChiNext shares in the Shenzhen market, the fourth area the A-shares in the Shanghai market, and the fifth area the B-shares in the Shanghai market. (b) Frequency of the fitting parameter $q$. (c) Frequency of the fitting parameter $\lambda_x$ for different values of $\tau_Q$.

In order to determine whether the $q$-exponential parameters are influenced by market state, i.e., bull or bear, we use a moving window analysis to track the evolution of the fitting parameters $q$ and $\lambda_x$. Because stocks with trading records shorter than three years have smaller $q$ values, we fix the window size at 48 months and exclude stocks with trading periods shorter than 89 months. We also discard first-month trading data from the remaining stocks because first-month records tend to be partial (i.e., do not span an entire month) and the volatilities for new IPOs are excessively large. This leaves 1395 stocks for our moving window analysis.




Figure 5: (color online). Plots of the evolution of the estimated $q$ and $\lambda_x$ of the $q$-exponential distribution for four chosen stocks. From the bottom panel to the top panel, the stock codes are 000001, 200002, 600220, and 900956, respectively. (a) Evolution of $q$. The dashed line represents the mean value of $q$ across all windows. The solid line corresponds to the $q$ of the entire period. (b) Evolution of $\lambda_x$ for different values of $\tau_Q$.


For each stock, we perform the same analysis in each moving window of the series that we perform on the whole series. We first fit the recurrence intervals for different values of $\tau_Q$ with the $q$-exponential distribution and estimate the corresponding distribution parameters $q$ and $\lambda_x$ in each window. We again find that the parameter $q$ is independent of $\tau_Q$ in each window for each stock, but the parameter $\lambda_x$ does not exhibit this behavior. We average the $q$ for different values of $\tau_Q$ in each window and plot the mean $\langle q \rangle$ as a function of time, where each point is placed at the last month of its moving window [see Fig. 5(a)]. For the sake of comparison, the estimated $q$ of the entire period is shown as a solid line and the mean value of $q$ across all the windows as a dashed line. Note that for the four stocks there is a big gap between the solid and dashed lines. Note also that the $\langle q \rangle$ curves exhibit significant fluctuations along the time axis. Figure 5(b) shows the evolution of $\lambda_x$ for different values of $\tau_Q$. Note that the trajectory of $\lambda_x$ exhibits the same trend as the trajectory of $\langle q \rangle$. It is not clear whether these trends reflect stock price trends, but our results indicate that the estimated parameters of the $q$-exponential distribution are influenced by market status and thus may be treated as an individual risk factor when explaining market returns.

5. Results of the large volatility prediction 

5.1. Hazard probability

The recurrence intervals between volatilities exceeding a threshold $Q$ are well approximated by the $q$-exponential distribution, which gives the interval distribution $p(t) = (2 - q)\lambda[1 + (q - 1)\lambda t]^{-\frac{1}{q-1}}$. By substituting this equation into Eq. (3), we obtain

$$W_q(\Delta t|t) = 1 - \left[1 + \frac{(q - 1)\lambda\Delta t}{1 + (q - 1)\lambda t}\right]^{1 - \frac{1}{q-1}}. \tag{19}$$

If we designate the top 1% of volatility values (corresponding to the mean recurrence time $\tau_Q = 100$) to be extreme events, we can estimate the hazard probability $W_q(\Delta t|t)$ as a function of $t$ for fixed $\Delta t$. Figure 6 shows the estimated hazard probability for two stocks when $\Delta t$ = 1, 5, and 10. Note that both the solid lines indicating the analytical solution of Eq. (19) and the markers indicating the empirical data decrease slowly, and that in each panel the two are in good agreement. The decreasing trend of $W(\Delta t|t)$ is consistent with the clustering behavior in the volatility series. Note that the hazard probability $W(\Delta t|t)$ is a general construction and can be used to estimate the risk in other kinds of time series.
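Equation (19) is cheap to evaluate directly. The short sketch below is illustrative only; the function name and the parameter values $q = 1.5$, $\lambda = 1$ are hypothetical and were chosen so that the arithmetic can be checked by hand.

```python
def qexp_hazard(dt, t, q, lam):
    """Hazard probability W_q(dt|t) of Eq. (19) for a q-exponential with 1 < q < 2."""
    ratio = 1.0 + (q - 1.0) * lam * dt / (1.0 + (q - 1.0) * lam * t)
    return 1.0 - ratio ** (1.0 - 1.0 / (q - 1.0))


# With q = 1.5 and lam = 1 the exponent is -1, so W(1|0) = 1 - 1/1.5 = 1/3
# and W(1|2) = 1 - 1/1.25 = 0.2: the hazard decays as t grows.
for t in (0, 2, 10):
    print(t, qexp_hazard(1, t, q=1.5, lam=1.0))
```

The decay of the hazard with $t$ in this example is the same qualitative behavior that the text attributes to volatility clustering.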
Figure 6: (color online). Plot of the hazard probability $W(\Delta t|t)$ for two stocks and $\Delta t$ = 1, 5, and 10. (a-c) Stock 000001. (d-f) Stock 900956.


5.2. Predicting large volatilities 

To forecast large volatility events in a volatility series we first calculate the recurrence times for a given $\tau_Q$, e.g., $\tau_Q = 100$, and estimate the distribution parameters. The hazard probability $W(1|t)$ that an extreme event will occur in the next period is determined using Eq. (19) and the estimated distribution parameters. Once the hazard probability breaks through the predefined threshold $Q_p$, an alarm is triggered, warning that a large volatility event is imminent. The top panel of Fig. 7(a) plots a subseries of the volatility values and highlights the events above the threshold $Q$ that corresponds to the mean recurrence time $\tau_Q$. The risk probability $W(1|t)$ is shown in the bottom panel. Note that $W(1|t)$ decreases as the time $t$ elapsed since the last large volatility event increases. The threshold $Q_p$ is plotted as a horizontal line to show how alarms are activated. By varying $Q_p$ within the $[0, 1]$ range, we obtain all pairs of $(A, D)$.

Figure 7(b) shows the ROC curves for ten stocks. Note that all ten curves are above the dashed line $D = A$, indicating that our prediction is better than random. The ten curves do not overlap, indicating that the accuracy of this prediction algorithm varies for different stocks. For the same false alarm rate $A$ (a vertical dashed line is plotted at $A = 0.1$ as an example), stock 900901 has the highest correct prediction rate and stock 300066 the lowest. We also calculate the correct prediction rate $D_{A=0.1}$ for all stocks at the false alarm level of 0.1. Figure 7(c) shows the frequency plot of $D_{A=0.1}$. Note that the peak is centered at 0.2, which is 0.1 higher than the value $D = A = 0.1$ expected for a random prediction. We also find that there are nine stocks with $D_{A=0.1} > 0.4$. The top three values are 0.7, 0.68, and 0.60 for stocks 000529, 000557, and 000592, indicating that our forecasting algorithm can accurately predict the large volatilities of these three stocks. Previous research has indicated that the efficiency of the algorithm is primarily influenced by the linear and nonlinear memory behavior in the original volatilities. Our results here indicate that stocks with stronger memory behaviors, such as volatility clustering and multifractality, could be more accurately predicted using our algorithm. Our algorithm only takes into consideration the probability distribution of recurrence intervals, but if the memory behavior of recurrence intervals were also included we believe that its predictive accuracy would be greatly enhanced.
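The alarm procedure described above can be sketched as a single pass over the event series. This is an illustration under stated assumptions rather than the authors' implementation: `lam` is the unscaled parameter $\lambda$ in units of one time step, the elapsed time is taken to be $t = 0$ immediately after an event, no alarm is raised before the first event is observed, and the function name is hypothetical.

```python
def alarm_series(is_event, q, lam, q_p):
    """Alarm flag for each time point, based on W(1|t) from Eq. (19)."""
    alarms = []
    elapsed = None                      # periods since the last event; None = no event yet
    for event in is_event:
        if elapsed is None:
            alarms.append(False)        # no history to condition on
        else:
            ratio = 1.0 + (q - 1.0) * lam / (1.0 + (q - 1.0) * lam * elapsed)
            w = 1.0 - ratio ** (1.0 - 1.0 / (q - 1.0))   # W(1|t) with t = elapsed
            alarms.append(w > q_p)
        # Update the clock after the outcome at this time point is known.
        if event:
            elapsed = 0
        elif elapsed is not None:
            elapsed += 1
    return alarms


# With q = 1.5 and lam = 1, W(1|0) = 1/3, W(1|1) = 0.25, W(1|2) = 0.2.
print(alarm_series([True, False, False, False], q=1.5, lam=1.0, q_p=0.22))
```

Because $W(1|t)$ decreases with $t$ for the $q$-exponential, alarms under this rule are active only for a limited time after each event, which is how the method exploits volatility clustering.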

6. Conclusion 

Figure 7: (color online). Prediction of large volatilities. (a) Plots of a representative volatility series in the top panels and the hazard probability $W(1|t)$ in the bottom panels. (b) Plots of ROC curves for ten stocks. (c) Distribution plots of the correct prediction rates at the false alarm level of 0.1 for all stocks.

In this work, we have utilized a decision-making algorithm to forecast the occurrence of large volatilities in Chinese stock markets based on the hazard probability, which is derived from the distribution of recurrence intervals between the volatilities exceeding a threshold $Q$. By fitting the volatility recurrence intervals with five candidate distributions and comparing their KS statistics, we have found that the volatility recurrence intervals are well approximated by the $q$-exponential distribution. The fitting parameter $q$ is found to be independent of the mean recurrence time $\tau_Q$, which is in one-to-one correspondence with the threshold $Q$, for all the stocks in our sample. However, the parameter $\lambda_x$ of the $q$-exponential distribution does not exhibit the same behavior as $q$. For half of the stocks, $\lambda_x$ exhibits a strong dependence on $\tau_Q$. Using a moving window analysis, we have found that both parameters are influenced by market status and exhibit the same trend over time. This behavior may have potential applications for explaining stock return volatility.

Using the $q$-exponential formula, we have derived an analytical solution for the hazard probability $W(\Delta t|t)$ of the next large volatility event above the threshold $Q$ occurring within a short time interval $\Delta t$ after an elapsed time $t$ from the last large volatility above $Q$. This analytical solution $W(\Delta t|t)$ is in good agreement with the empirical risk probability derived from real stock data. We have adopted a decision-making algorithm and have used the hazard probability to forecast large volatilities. At the false alarm level of 0.1, we have found that the average correct prediction rate is 0.2 for all stocks. We have also found that there are three stocks with a correct prediction rate of 0.6 or greater, indicating that our prediction algorithm can be accurate in forecasting large volatilities. Our findings may shed new light on our understanding of extreme volatility behavior and may have potential applications in managing stock market risk.

Appendix A. Maximum likelihood estimation of distribution parameters


Appendix A.1. Stretched exponential distribution

To estimate the parameters of Eq. (14), the first step is to use the two conditions $\int_0^{\infty} f(x)\,dx = 1$ and $\int_0^{\infty} x f(x)\,dx = 1$ to reduce the number of parameters. For the first integral, we have

$$\int_0^{\infty} f(x)\,dx = \int_0^{\infty} a\tau_Q e^{-(b\tau_Q x)^{\mu}}\,dx = 1. \tag{A.1}$$

Letting $y = (b\tau_Q x)^{\mu}$, we have $x = y^{1/\mu}/(b\tau_Q)$ and $dx = y^{1/\mu - 1}/(\mu b\tau_Q)\,dy$. Then we obtain

$$\int_0^{\infty} a\tau_Q e^{-(b\tau_Q x)^{\mu}}\,dx = \int_0^{\infty} a\tau_Q e^{-y} \frac{y^{1/\mu - 1}}{\mu b\tau_Q}\,dy = \frac{a}{\mu b}\Gamma(1/\mu) = 1. \tag{A.2}$$

For the second integral, we have

$$\int_0^{\infty} x f(x)\,dx = \int_0^{\infty} x\,a\tau_Q e^{-(b\tau_Q x)^{\mu}}\,dx = 1. \tag{A.3}$$

Again letting $y = (b\tau_Q x)^{\mu}$, so that $x = y^{1/\mu}/(b\tau_Q)$ and $dx = y^{1/\mu - 1}/(\mu b\tau_Q)\,dy$, we obtain

$$\int_0^{\infty} x\,a\tau_Q e^{-(b\tau_Q x)^{\mu}}\,dx = \int_0^{\infty} \frac{y^{1/\mu}}{b\tau_Q}\,a\tau_Q e^{-y} \frac{y^{1/\mu - 1}}{\mu b\tau_Q}\,dy = \frac{a}{\mu b^2 \tau_Q}\Gamma(2/\mu) = 1. \tag{A.4}$$

Solving these two equations, the parameters $a$ and $b$ can be expressed in terms of the exponent $\mu$ and $\tau_Q$,

$$a = \frac{\mu\,\Gamma(2/\mu)}{\Gamma(1/\mu)^2\,\tau_Q}, \qquad b = \frac{\Gamma(2/\mu)}{\Gamma(1/\mu)\,\tau_Q}. \tag{A.5}$$

The likelihood function of the stretched exponential distribution can be written as

$$L = \prod_{i=1}^{n} a\tau_Q \exp[-(b\tau_Q x_i)^{\mu}]. \tag{A.6}$$

Taking the logarithm of both sides, we have

$$\ln L = n \ln a + n \ln \tau_Q - \sum_{i=1}^{n} (b\tau_Q x_i)^{\mu}. \tag{A.7}$$

By substituting Eq. (A.5) into Eq. (A.7), the log-likelihood function of the stretched exponential distribution has only one variable, $\mu$. Our purpose is to find the value of $\mu$ that maximizes $\ln L$. It is very hard to obtain a closed-form expression by taking the derivative of $\ln L$ with respect to $\mu$. Hence, we simply evaluate $\ln L$ while changing $\mu$ from 0 to 5 with a step of $10^{\ldots}$. We take the $\mu$ with the maximum $\ln L$ as the solution of our maximum likelihood estimation.
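The one-dimensional scan over $\mu$ can be sketched as follows. The step size of 0.01 is an assumption made for this example (the step used in the text is not legible), and the function names are hypothetical. Because the fit is done on scaled intervals $x = \tau/\tau_Q$, only the products $a\tau_Q$ and $b\tau_Q$ from Eq. (A.5) are needed, and $\tau_Q$ drops out.

```python
import math


def stretched_exp_loglik(xs, mu):
    """ln L of Eq. (A.7) for scaled intervals xs, with a and b eliminated via Eq. (A.5)."""
    # Eq. (A.5): a*tau_Q = mu*G(2/mu)/G(1/mu)^2 and b*tau_Q = G(2/mu)/G(1/mu).
    # Logs of the gamma function avoid overflow when mu is small.
    log_b = math.lgamma(2.0 / mu) - math.lgamma(1.0 / mu)
    log_a = math.log(mu) + math.lgamma(2.0 / mu) - 2.0 * math.lgamma(1.0 / mu)
    return len(xs) * log_a - sum(math.exp(mu * (log_b + math.log(x))) for x in xs)


def fit_stretched_exp(xs, step=0.01, mu_max=5.0):
    """Scan mu over (0, mu_max] on a regular grid and return the maximizer of ln L."""
    n_steps = int(round(mu_max / step))
    grid = [k * step for k in range(1, n_steps + 1)]
    return max(grid, key=lambda mu: stretched_exp_loglik(xs, mu))


# Hypothetical scaled intervals, for illustration only.
mu_hat = fit_stretched_exp([0.1, 0.3, 0.5, 1.0, 1.2, 2.9])
```

For $\mu = 1$ the density reduces to $e^{-x}$, so `stretched_exp_loglik(xs, 1.0)` equals minus the sum of the sample, which is a convenient sanity check.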


Appendix A.2. Power-law distribution with an exponential cutoff

In the same way as for the stretched exponential, we use $\int_0^{\infty} f(x)\,dx = 1$ and $\int_0^{\infty} x f(x)\,dx = 1$ to reduce the number of parameters of Eq. (15). For the first integral, we have

$$\int_0^{\infty} f(x)\,dx = \int_0^{\infty} c\,\tau_Q^{-\gamma} x^{-\gamma-1} e^{-k\tau_Q x}\,dx = 1. \tag{A.8}$$

Letting $y = k\tau_Q x$, we have $x = y/(k\tau_Q)$ and $dx = dy/(k\tau_Q)$. Then we obtain

$$\int_0^{\infty} c\,\tau_Q^{-\gamma} \left(\frac{y}{k\tau_Q}\right)^{-\gamma-1} e^{-y}\,\frac{dy}{k\tau_Q} = c\,k^{\gamma}\,\Gamma(-\gamma) = 1. \tag{A.9}$$

For the second integral, we have

$$\int_0^{\infty} x f(x)\,dx = \int_0^{\infty} c\,\tau_Q^{-\gamma} x^{-\gamma} e^{-k\tau_Q x}\,dx = 1. \tag{A.10}$$

Again letting $y = k\tau_Q x$, so that $x = y/(k\tau_Q)$ and $dx = dy/(k\tau_Q)$, we obtain

$$\int_0^{\infty} c\,\tau_Q^{-\gamma} \left(\frac{y}{k\tau_Q}\right)^{-\gamma} e^{-y}\,\frac{dy}{k\tau_Q} = \frac{c\,k^{\gamma-1}}{\tau_Q}\,\Gamma(1-\gamma) = 1. \tag{A.11}$$

Solving the two equations, we obtain

$$c = \frac{k^{-\gamma}}{\Gamma(-\gamma)}, \qquad k = \frac{\Gamma(1-\gamma)}{\tau_Q\,\Gamma(-\gamma)}. \tag{A.12}$$

The likelihood function of the power-law distribution with an exponential cutoff can be written as

$$L = \prod_{i=1}^{n} c\,\tau_Q^{-\gamma} x_i^{-\gamma-1} e^{-k\tau_Q x_i}. \tag{A.13}$$

Taking the logarithm of both sides, we have

$$\ln L = n \ln c - \gamma n \ln \tau_Q - (\gamma + 1)\sum_{i=1}^{n} \ln x_i - \sum_{i=1}^{n} k\tau_Q x_i. \tag{A.14}$$

By substituting Eq. (A.12) into Eq. (A.14), the log-likelihood function of the power-law distribution with an exponential cutoff has only one variable, $\gamma$. Our purpose is to find the value of $\gamma$ that maximizes $\ln L$. It is very hard to obtain a closed-form expression by taking the derivative of $\ln L$ with respect to $\gamma$. Hence, we simply evaluate $\ln L$ while changing $\gamma$ from $-1$ to 0 with a step of $10^{\ldots}$. We take the $\gamma$ with the maximum $\ln L$ as the solution of our maximum likelihood estimation.
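The scan over $\gamma$ can be sketched in the same way. Using the identity $\Gamma(1-\gamma) = -\gamma\,\Gamma(-\gamma)$, the constraint on $k$ in Eq. (A.12) simplifies to $k\tau_Q = -\gamma$, so for scaled intervals the density becomes a gamma density with shape and rate both equal to $-\gamma$. The step size of 0.01 and the function names are assumptions made for this example.

```python
import math


def powerlaw_cutoff_loglik(xs, gamma):
    """ln L of Eq. (A.14) for scaled intervals xs, with c and k eliminated via Eq. (A.12)."""
    # Gamma(1 - gamma) = -gamma * Gamma(-gamma), so Eq. (A.12) gives k*tau_Q = -gamma
    # and c * tau_Q^(-gamma) = (-gamma)^(-gamma) / Gamma(-gamma).
    alpha = -gamma                                   # 0 < alpha < 1
    log_norm = alpha * math.log(alpha) - math.lgamma(alpha)
    return (len(xs) * log_norm
            + (alpha - 1.0) * sum(math.log(x) for x in xs)
            - alpha * sum(xs))


def fit_powerlaw_cutoff(xs, step=0.01):
    """Scan gamma over the open interval (-1, 0) and return the maximizer of ln L."""
    n_steps = int(round(1.0 / step))
    grid = [-k * step for k in range(1, n_steps)]    # -0.01, -0.02, ..., -0.99
    return max(grid, key=lambda g: powerlaw_cutoff_loglik(xs, g))


# Hypothetical scaled intervals, for illustration only.
gamma_hat = fit_powerlaw_cutoff([0.1, 0.3, 0.5, 1.0, 1.2, 2.9])
```

The end points $\gamma = -1$ and $\gamma = 0$ are excluded from the grid because $\Gamma(-\gamma)$ diverges as $\gamma \to 0$.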